The crystallins have relatively high refractive increments compared to other proteins. The Greek key motif
in βγ-crystallins was compared with that in other proteins, using predictive analysis from a protein database,
to see whether this may be related to the refractive increment. Crystallins with Greek key motifs have
significantly higher refractive increments and more salt bridges than other proteins with Greek key
domains. Specific amino acid substitutions accompany this: lysine and glutamic acid residues are replaced by
arginine and aspartic acid, respectively, as refractive increment increases. These trends are also seen in
S-crystallins, suggesting that the primary sequence of crystallins may be specifically enriched with amino
acids with appropriate values of refractive increment to meet optical requirements. Comparison of crystallins
from five species (two aquatic and three terrestrial) shows that the lysine/arginine correlation with
refractive increment occurs in all species investigated. This may be linked with formation and maintenance of
salt bridges.

The ability of the visual system to detect and interpret information from light is an evolutionary adaptation
following an ancient divergence from a primitive light-sensing structure that has resulted in species-specific
systems with varying degrees of sophistication yet remarkably similar configurations. The eye lens is a key
refractive element that has been tailored for each organism to meet the visual demands of its
functional needs. It is composed of lens fibre cells that grow in concentric layers over existing tissue, with no
concomitant cellular loss; the tissue accrual is a process that begins in utero and continues through life. The
fibre cells are differentiated from epithelial cells located under the anterior part of the lens capsule, which is a
semi-elastic basement membrane that contains the lens and transfers the forces needed to adjust its shape with
changes in focussing power. During the differentiation process, the structural proteins of the lens, the crystallins,
are synthesised in concentrations that vary across the tissue in order to create the refractive index gradient required
for optimising image quality. Cytosolic concentrations can approach ~400 mg/ml in humans. In the terminal
step of differentiation the nucleus becomes pyknotic and the cell enters proteostasis, retaining the requisite
concentration and mixture of crystallins. The lens, therefore, retains within its refractive index gradient a
chronology of ageing, with newly synthesised proteins in peripheral fibre cells and proteins produced during
gestation in its centre, evolved for longevity and for maintenance of optical function.

The major function of the lens is to maintain transparency and to provide sufficient refractive power for light to
focus on the retina. In terrestrial species, around two-thirds of the refraction occurs at the air/cornea interface
with the lens providing the additional refractive power and, in species with sufficiently malleable lenses, the
fine-tuning required to adjust lens shape for different viewing distances. In aquatic species, the comparatively high
refractive index of water negates the refractive power of the cornea and has resulted in the evolution of lenses with
steep index gradients and high refractive index magnitudes required to provide all or most of the refractive power
for the aquatic eye.

The refractive index is related to protein concentration by the Gladstone-Dale formula. This simple linear
equation introduces the refractive index increment (dn/dc), which defines how much a given concentration of a
protein will contribute to the refractive index. Across the crystallin isoforms, differences in dn/dc values have
been found, with the smallest of the crystallin classes, the γ-crystallins, having the highest dn/dc. This is also
the crystallin class that is found in the core of the lens, where refractive index is the highest.
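
As a rough numerical illustration of the linear relation described above, the following sketch evaluates n = n_solvent + (dn/dc) x c. The function name and the solvent index of 1.333 (approximately that of water) are assumptions made for illustration only; the concentration and the two dn/dc values are those quoted in this paper.

```python
def refractive_index(conc_g_per_ml: float,
                     dn_dc_ml_per_g: float,
                     n_solvent: float = 1.333) -> float:
    """Gladstone-Dale relation: n = n_solvent + (dn/dc) * c.

    n_solvent = 1.333 is an assumed value (roughly that of water),
    not a figure taken from the paper.
    """
    return n_solvent + dn_dc_ml_per_g * conc_g_per_ml


# ~400 mg/ml expressed in g/ml
concentration = 0.4

# Mean predicted dn/dc values reported in this study (ml/g)
n_crystallin = refractive_index(concentration, 0.1999)  # beta/gamma-crystallin dataset
n_typical = refractive_index(concentration, 0.1899)     # other human proteins

print(f"crystallin-like: {n_crystallin:.4f}, typical protein: {n_typical:.4f}")
```

Under these assumptions, a difference of 0.01 ml/g in dn/dc changes the refractive index by 0.004 at this concentration, which indicates the scale of the effect being discussed.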

It is thought that the requirement for longevity, which is of particular importance in the core of the lens that
contains the oldest cells, has been met by the Greek key structure, which is known for its thermodynamic
stability and is a feature of γ-crystallins and of the larger βγ-crystallin family. Higher order
structures as well as the dn/dc of any given protein are influenced by the primary sequence of amino acids.
McMeekin et al. measured the dn/dc of each amino acid at 589 nm and 25 °C, taking into account individual
amino acid refractivities and partial specific volume. These values were recently used by Zhao et al. to compute
the dn/dc values for an array of proteins. Zhao et al. reported the
enrichment of aromatic residues, associated with high dn/dc, in
lenticular crystallins, particularly in the βγ isoforms.
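
To make the idea of a composition-based prediction concrete, the sketch below computes a mass-weighted mean of per-residue dn/dc values. This is a simplified illustration and an assumption on our part; it is not necessarily the exact procedure of McMeekin et al. or Zhao et al. Only the phenylalanine and tyrosine increments quoted in this paper (0.244 and 0.240 ml/g) are used; the residue masses are standard values, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def predict_dn_dc(sequence, residue_dn_dc, residue_mass):
    """Mass-weighted mean of per-residue dn/dc values (ml/g).

    Simplified sketch: every residue contributes in proportion to its mass.
    Raises KeyError if the sequence contains a residue missing from the tables.
    """
    counts = Counter(sequence)
    total_mass = sum(residue_mass[aa] * n for aa, n in counts.items())
    weighted = sum(residue_dn_dc[aa] * residue_mass[aa] * n
                   for aa, n in counts.items())
    return weighted / total_mass


# Per-residue increments quoted in this paper (589 nm, 25 degrees C), ml/g
RESIDUE_DN_DC = {"F": 0.244, "Y": 0.240}
# Residue masses in Da (free amino acid minus one water)
RESIDUE_MASS = {"F": 147.18, "Y": 163.18}

# Hypothetical toy peptides: one phenylalanine replaced by tyrosine
before = predict_dn_dc("FFFY", RESIDUE_DN_DC, RESIDUE_MASS)
after = predict_dn_dc("FFYY", RESIDUE_DN_DC, RESIDUE_MASS)
print(f"before: {before:.4f} ml/g, after: {after:.4f} ml/g")
```

A full prediction would need increments and masses for all twenty residues; these are deliberately not filled in here.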

Whether there is a correlation between the Greek key structure
and primary sequences containing amino acid residues with high
refractivities in all proteins, or whether Greek key motifs and
amino acids with higher refractivities are found together only in certain
crystallin isoforms, was not known. This work suggests that βγ-crystallin
isoforms have high dn/dc values compared to other
proteins with Greek key structures.

Results 

Specific refractive increments of Greek key domains. Figure 1
shows the dn/dc distribution of the βγ-crystallin dataset. The
mean dn/dc of the βγ-crystallin dataset is 0.1999 ml/g (SD:
0.0036 ml/g). The mean dn/dc of the double Greek key motif of
crystallins was found to be 0.2003 ml/g (SD: 0.0035 ml/g). Plotting
predicted dn/dc against sequence length for the 292 entries of the
βγ-crystallin dataset shows a general trend to decreasing dn/dc with
longer sequences (Figure 2A). These observations may suggest a
link between the double Greek key motif and high dn/dc. It is clear
from Figure 2B, which highlights the range of sequence lengths
between 50 and 300 residues, that there is a predominance of
native proteins with lengths of between 170 and 180 residues, which
corresponds to the double Greek key motif.

The dn/dc values of representatives of double Greek key domains from a
wide range of proteins were analysed to establish whether high dn/dc is a
consequence of the motif structure. The distribution of dn/dc for
representatives of 52 different double Greek key domains indicates
that Greek keys from βγ-crystallin proteins have the highest dn/dc
values, appearing as an outlier as seen in Figure 3A. The mean dn/dc
value for double Greek key motifs is 0.1908 ml/g, which is close to
the mean for human proteins, i.e. 0.1899 ml/g, but substantially lower
than that of βγ-crystallin (0.2009 ml/g). No correlation is seen
between either the sequence length or the number of strands of the
double Greek key domain and the predicted dn/dc value (Figure 3B
and 3C).

Amino acid compositions and refractive increments in crystallin
isoforms. In order to investigate the specificity of the βγ-crystallin
double Greek key domain, a subset of the initial dataset was created
containing only proteins within the 170-180 residue range. This
comprises 116 βγ-crystallin isoforms with minimum and
maximum dn/dc values of 0.1965 and 0.2084 ml/g respectively and
a mean of 0.2024 ml/g. Their amino acid compositions show an
inverse correlation between the proportions of arginine and lysine
residues across the range of dn/dc values (r = -0.85, p < 0.0005): the sum of
arginine and lysine residues is constant at ~21 but as the dn/dc value
increases, the number of lysine residues decreases and that of
arginine increases concomitantly (Figure 4A). A similar inverse
correlation is seen with glutamic acid being replaced by aspartic
acid (r = -0.70, p < 0.0005) (Figure 4B). In both cases, an
increase in charged amino acids with high dn/dc values is at the
expense of those with lower dn/dc values, leading to a higher
overall dn/dc value for the given protein. A similar analysis was
conducted on the S-crystallin family, which is the major protein
class in eye lenses of cephalopods (e.g. octopus, squid, cuttlefish). As
illustrated in Figure 4C and Figure 4D, inverse correlations between
the proportions of arginine and lysine residues, and of glutamic acid and
aspartic acid, are also observed in this family of proteins (r = -0.92,
p < 0.0005 and r = -0.81, p < 0.0005, respectively). Notably,
S-crystallins from the octopus, which are demarcated by open circles
(Figure 4C), differ in composition from other S-crystallin sequences.
This causes the deviations between the curves representing K and R
residues but does not affect the strength of the correlation between
them.
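
The kind of analysis described above can be sketched as follows: count lysine (K) and arginine (R) residues in each sequence and compute a Pearson correlation coefficient between the two sets of counts. The toy sequences and function names are hypothetical, and the use of a plain Pearson coefficient is an assumption; the paper does not state which correlation statistic was used.

```python
from math import sqrt


def count_residue(sequences, residue):
    """Number of occurrences of a one-letter residue code in each sequence."""
    return [seq.count(residue) for seq in sequences]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy sequences in which K + R is constant and K is
# progressively replaced by R (not real crystallin sequences)
toy_sequences = ["KKKKRR", "KKKRRR", "KKRRRR", "KRRRRR"]

lysine_counts = count_residue(toy_sequences, "K")
arginine_counts = count_residue(toy_sequences, "R")
r = pearson_r(lysine_counts, arginine_counts)
print(f"K-R correlation: {r:.2f}")
```

In this idealised example every lysine lost is matched by an arginine gained, so the coefficient is at its negative limit; real datasets such as those in Figure 4 show weaker but still strongly negative values.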

Interspecies comparison. Amino acid sequences from a range of
crystallins with different dn/dc values were compared in five
species to examine whether these specific correlations (more lysine
and glutamic acid in crystallins with relatively low dn/dc values and
more arginine and aspartic acid in crystallins with relatively high
dn/dc values) were consistent across species. Comparison was made
between two aquatic and three terrestrial species: ranine (Xenopus
laevis), piscine (Danio rerio), murine (Mus musculus), bovine (Bos
taurus), and human (Homo sapiens) (Table 1). The greatest variation
in dn/dc values is seen in the zebrafish (Danio rerio) and the least in
the human. The results show that a number of lysine residues in
sequences of lower dn/dc values were substituted by arginine
in sequences with higher dn/dc values (Table 2). There are no such
consistent correlations between glutamic acid residues and aspartic
acid residues. Rather, Table 2 also shows that between the protein
with the lowest dn/dc and those with the highest dn/dc in any given
species, there is a consistent substitution of phenylalanine by
tyrosine. In contrast with the other set of substitutions, this creates
a very slight decrease in dn/dc.

Figure 1 | Frequency distribution of predicted refractive index increments of βγ-crystallins. Green, red and yellow colours indicate the overall average
value, the average value of double Greek key motif sequences and the average value of other human proteins, respectively.

Figure 2 | Predicted refractive index increment plotted against sequence length of βγ-crystallins showing A) all crystallins in the dataset; B) crystallins
the sequence length of which is between 50 and 300 amino acids. Green, red and yellow colours indicate the overall average value, the average
value of double Greek key motif sequences and the average value of other human proteins, respectively.

Salt bridge analysis. Analysis of representative βγ-crystallin
structures indicates that mammalian βγ-crystallins have around 5 salt
bridges per domain length (of around 90 residues); this is not
found for the non-mammalian proteins. A separation that is
considerably greater than 10 residues is the most frequent
separation length (73% of cases) and in cases of separation that are
fewer than 10 residues, a separation of 2 residues is the most common
(36% of cases), with 21% of cases showing a separation of 7. Although
most bridges are formed between amino acids that are > 10 residues
apart, they are largely from the same domain (84% of cases). Around
a third of these cases are interstrand bridges. Figure 5A highlights salt
bridge conservation across five βγ-crystallin isoforms, i.e. γB, βB2,
βB3, βB1 and βA4. Conserved salt bridges connect two Greek keys
within a domain as well as between two domains (Figure 5B and C).
The isoforms βB2, βB3 and βB1 display two cross-domain salt bridges;
βA4 has a single cross-domain salt bridge, with a disulphide bond within 4 residues
of it. The γ-crystallin (γB) does not have any cross-domain salt
bridges; these are prevented from forming because of the negatively
charged residue 29 on one domain and the corresponding residue 147 on the
other domain. All βγ-crystallins have a salt bridge between the two
Greek keys of the second domain. βA4 also displays a salt bridge
between the two Greek keys of the first domain.
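
The classification of salt bridges by sequence separation and domain membership described above could be sketched as follows. The residue pairs and the domain boundary at residue 90 are hypothetical values chosen only for illustration (the paper states a domain length of around 90 residues), and the function name is our own.

```python
def summarise_salt_bridges(bridges, domain_boundary, long_range_cutoff=10):
    """Summarise salt bridges given as (residue_i, residue_j) number pairs.

    A bridge is 'long range' if its partners are more than
    `long_range_cutoff` residues apart in sequence, and 'same domain' if
    both partners lie on the same side of `domain_boundary`.
    """
    long_range = [(i, j) for i, j in bridges if abs(i - j) > long_range_cutoff]
    same_domain = [(i, j) for i, j in long_range
                   if (i <= domain_boundary) == (j <= domain_boundary)]
    return {
        "fraction_long_range": len(long_range) / len(bridges),
        "fraction_long_range_same_domain":
            len(same_domain) / len(long_range) if long_range else 0.0,
    }


# Hypothetical residue pairs for a two-domain protein (not measured data)
toy_bridges = [(10, 12), (20, 60), (29, 147), (100, 150), (110, 117)]
summary = summarise_salt_bridges(toy_bridges, domain_boundary=90)
print(summary)
```

Applied to real structures, the same bookkeeping would yield the proportions of the kind reported in this section.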

A fundamental feature incorporated in the Greek key structure is
the conserved β-hairpin motif. Calculating the values of dn/dc for
sequences that comprise the four β-hairpins in each of the isoforms
shown in Figure 5A indicates that there is a slight increase in the
dn/dc value for these sequences compared to the whole sequence for
each respective isoform. The change in dn/dc value ranges from an
increase of 0.36% for βB3 (dn/dc = 0.1977 for β-hairpins compared
with 0.1970 for the whole sequence) to an increase of 3.37% for βB2
(dn/dc = 0.2004 for β-hairpins compared with 0.1939 for the whole
sequence).

Discussion 

The concept of a refractive increment and the property of contributing
to refraction have obvious relevance to the crystallins given the
predominant function of the eye lens. It should be remembered that
dn/dc is not an immutable property but depends on the solvent,
wavelength and, to a much lesser extent, temperature. Given that
experimental crystallin samples prepared for measurement of dn/dc
need to be constituted in a solvent that replicates fluid found in the
eye lens, the greatest variability in dn/dc values from different studies
comes from the wavelength used for measurement. Most experimental
studies on dn/dc of crystallins have used proteins from the
bovine lens and some have concentrated on α-crystallins
and/or γ-crystallins. Where all three broad classes of crystallins
have been measured, dn/dc was found to be highest for γ-crystallins.
Most importantly, experimental studies did not isolate particular
isoforms within the crystallin classes. Theoretical studies that
have calculated molar refractivities of individual amino acids or
used these to calculate dn/dc were able to compile these values for
many different proteins and provide a deeper insight into the
reasons why a protein may have a relatively low or high dn/dc value.

Figure 3 | Frequency distribution A), sequence length B) and number of strands C) plotted against refractive index increments of double Greek key
domains. The red colour represents βγ-crystallins.



Aromatic (tyrosine, tryptophan and phenylalanine) as well as
sulphur-containing amino acids (methionine, cysteine) have relatively
high dn/dc values, whilst alanine, proline and serine have the lowest
values. The high content of aromatic amino acids in the crystallins,
coupled with the relatively high cysteine content of γ-crystallins,
provides some explanation for the high dn/dc of this
protein class. The γ-crystallin γM, a protein found in certain aquatic
species, has the highest dn/dc (0.209 ml/g) thus far found in a
crystallin, resulting partly from its high level of methionine. The
high content of methionine may facilitate denser packing, which
would be advantageous for a high refractive index.

The mean dn/dc of the βγ-crystallin dataset studied (0.1999 ml/g;
SD: 0.0036 ml/g) is significantly higher than the mean dn/dc for
other human proteins (mean 0.1899 ml/g; SD: 0.0030 ml/g)
and sequence length was found to be inversely correlated to dn/dc.
Comparison of proteins with double Greek key motifs shows a similar
trend: the double Greek key motif in crystallins has a higher
dn/dc than those in other proteins. A Greek key motif is therefore not
necessarily indicative of a high dn/dc value. Sequence length in Greek
keys was not found to be correlated with dn/dc. The dn/dc values of
the β-hairpins in the Greek keys from isoforms γB, βB2, βB3, βB1
and βA4 are slightly higher than the dn/dc values of the whole
sequence. Whether the amino acids in these structural regions have
a dual role in contributing to refractive index and to structural
stability requires further investigation.

Figure 4 | The number of residues plotted for each respective protein from the selected datasets of βγ- and S-crystallins, showing in A) and C) the
proportion of lysine (K) and arginine (R) residues and in B) and D) the number of glutamic acid (E) and aspartic acid (D) residues. βγ-crystallin K-R
correlation: -0.85, p < 0.0005; βγ-crystallin E-D correlation: -0.70, p < 0.0005; S-crystallin K-R correlation: -0.92, p < 0.0005; S-crystallin E-D correlation:
-0.81, p < 0.0005. Open circles correspond to S-crystallins from octopus.

Tighter packing of proteins will increase the refractive index as
proteins have a higher refractive index than water. The substitutions
of arginine for lysine and aspartic acid for glutamic acid not only
increase the dn/dc value but could also result in a more compact
protein structure. The guanidinium group on the side chain of
arginine has a geometry and charge distribution that renders it able to
form multiple hydrogen bonds; the shorter, less flexible side chain
of aspartic acid compared to glutamic acid may also facilitate
compaction.

Both in the βγ-crystallins and in the S-crystallins, the higher the
dn/dc value of a protein, the fewer lysine and glutamic acid residues
and the more arginine and aspartic acid residues it contains. Since
lysine/arginine and glutamic/aspartic acid residues are associated
with the formation of salt bridges, the observed substitutions may
be constrained by the need to maintain existing salt bridges. Analysis
of representative βγ-crystallin protein structures revealed that
mammalian βγ-crystallins have a much higher proportion of salt bridges
per domain than would be expected given the domain length of about
90 residues (5 salt bridges compared to a standard of <2 salt bridges
for a domain of that length). This pattern is not found among the
three non-mammalian crystallins investigated in this study; these
display only one salt bridge per domain on average. It is notable that
salt bridges are more frequent in α-helical structures, whereas
βγ-crystallins have a relatively high proportion of β-pleated sheet.

A cross-species comparison showed that arginine consistently
replaced lysine with progression from lower to higher dn/dc value
proteins within a species. The glutamic acid/aspartic acid correlation
was not borne out in this comparison. Instead another trend was
observed: a decrease in phenylalanine and a concomitant increase in
tyrosine with increase in dn/dc value. As phenylalanine and
tyrosine both have relatively high dn/dc values that are very close in
magnitude (0.244 and 0.240 ml/g respectively), such a substitution
makes little difference to the refractive index.

The findings of this study show relatively higher numbers of salt
bridges in mammalian βγ-crystallins than in other proteins, and
there is a predominance of salt bridges formed between amino acids
separated by more than 10 residues. Additionally, the relatively high
proportion of interstrand bridges in the crystallins compared to what
has been found in other proteins may be indicative of long-range
structural stability. This is of particular importance to the crystallins,
which remain in the cytoplasm of lens fibre cells from their synthesis
to death of the organism. In the case of cells from the central regions
of lenses, protein synthesis has taken place during gestation and
maintenance of optical quality is required for decades. The highest
content of γ-crystallins is found in the central regions of mammalian
lenses where refractive index reaches maximum magnitude; in
cephalopods the predominant proteins which contribute to the high
refractive index are S-crystallins. Both protein classes have
relatively high refractive increments.

Table 1 | Refractive index increment of selected sequences

                                  XENLA                DANRE                MOUSE                BOVINE               HUMAN
Beta-B2 crystallin
(reference)                       (Q6DJC7)             (Q52JI4)             (P62696)             (P02522)             (P43320)
Refractive increment              0.1958               0.1950               0.1942               0.1939               0.1942
Gamma crystallin with low dn/dc   Gamma-E-crystallin   GammaM5-crystallin   Gamma-S-crystallin   Gamma-S-crystallin   Gamma-S-crystallin
(reference)                       (Q6DJC9)             (Q5XJ63)             (O35486)             (P06504)             (P22914)
Refractive increment              0.2001               0.1974               0.1981               0.1988               0.1984
Gamma crystallin with high dn/dc  Gamma-A-crystallin   Gamma M2-crystallin  Gamma-crystallin E   Gamma-F-crystallin   Gamma-A-crystallin
(reference)                       (Q66KU7)             (A7E2K8)             (Q03740)             (P23005)             (P11844)
Refractive increment              0.2074               0.2077               0.2033               0.2021               0.1999

Table 2 | Interspecies comparison of amino acid substitutions

Columns, in order: Low γ -> High γ (K->R, R->K, F->Y, Y->F); βB2 -> Low γ (K->R, R->K, F->Y, Y->F); βB2 -> High γ (K->R, R->K, F->Y, Y->F).

Species  Values
XENLA    [...] 0 6 [...]
DANRE    [...] 0 6 [...] 3 1 [...] 0 2 0
MOUSE    3 0 2 0 4 0 [...] 2 1
BOVINE   4 0 3 0 [...] 1 1 1 [...] 0 3 0
HUMAN    4 0 3 0 2 1 2 1 4 0 3 0



Whilst the optical function of the lens is to provide refractive
power, the quality of the optics relies on transparency. Cataract
results in a loss of transparency and it has recently been shown that
congenital mutations in human γD-crystallin can cause cataract to
develop with or without disruption to the Greek key structure.
Mutations such as that which results in substitution of arginine by
serine at position 77 (R77S) do not destabilise the tertiary structure
yet result in cataract in the cortical regions of the lens. Other
mutations, such as the one that leads to substitution of proline for alanine
(A36P), disrupt the Greek key structure and cause nuclear cataract.
Single substitutions similar to the aforementioned do not produce
any substantive change in the value of dn/dc.

Figure 5 | Salt bridge conservation in βγ-crystallins. (A) Multiple alignment of βγ-crystallin sequences including alignment of the two domains. Only
residues relevant to conserved salt bridges and disulphide bonds are displayed. Green and red numbers represent negatively and positively charged
residues, respectively. Yellow dashed lines show disulphide bonds. Pale blue shaded residues are involved in cross-domain salt bridges. Navy shaded
residues are involved in completely conserved salt bridges; (B) inter-protein disulphide bridges observed in a homodimer of crystallin βB2. Monomers
are shown in different colours; (C) inter-domain salt bridges formed within a single crystallin βB1 monomer. Salt bridge interactions are shown as red
dotted lines.

The lenticular crystallins are organised to ensure that the lens
meets the refractive demands of the eye. The βγ-crystallins and the
S-crystallins contain residues that contribute to a high dn/dc when
compared to non-lenticular proteins, and the Greek key motif, which
exists in many proteins, is linked with a higher dn/dc only when it is
found in crystallins. Salt bridge interactions that stabilise protein
structure and provide interactive potential are relevant to structural
longevity of the crystallins and are necessary for maintenance of
transparency over decades. The crystallins have not only evolved
with a primary sequence that optimises their contribution to
refraction, they also have higher order arrangements that are conducive to its
preservation.

Methods 

Analysis of βγ-crystallin refractive index increment. Refractive index increments
(for 589 nm at 25 °C) were predicted for all available sequences belonging to the
βγ-crystallin family. Those sequences were retrieved using relevant seed sequences as
Psi-Blast queries. In order to ensure the widest coverage while avoiding the introduction
of unrelated proteins, seeds were defined as all sequences whose annotations, based
on experimental data, specify the molecular function as "the action of a molecule that
contributes to the structural integrity of the lens of an eye" (GO:0005212) and identify
them as belonging to the βγ-crystallin family. As a result, 13 seed sequences were used:
7 from mouse - β-crystallin A1/A2/B2/S and γ-crystallin B/C/D/E - and 6 from rat -
β-crystallin A4/B1/B3 and γ-crystallin C/D/E.

All sequences returned by Psi-Blast with an e-value below 1 were mapped to
UniRef100 to remove duplicates and fragments. Since the βγ-crystallin superfamily
contains a few non-crystallin members, such as absent in melanoma 1 (AIM1), their
associated sequences were removed from the initial list. This was performed by
generating a phylogenetic tree using FastTree from a multiple alignment and
eliminating sequences belonging to non-crystallin branches. Eventually, the diversity
of the βγ-crystallin family was represented by 292 entries. Prediction of their dn/dc
values was performed following the computational method outlined by McMeekin
et al. and described in previous studies.

Analysis of S-crystallin refractive index increment. Refractive index increments (for
589 nm at 25 °C) were predicted for relevant sequences belonging to the S-crystallin
family. Those sequences were retrieved using a seed sequence - squid S-crystallin
(P18426) - as Psi-Blast query. All sequences returned with an e-value below 1 were
candidates for further filtering. Sequences not belonging to the S-crystallin family,
such as its homologous glutathione S-transferases, were discarded. Fragments,
predicted and hypothetical sequences were also not considered. Finally, among the 38
remaining S-crystallin sequences, only those belonging to cephalopod species were
selected for our study, i.e. 24 from squid and 8 from octopus; 6 oyster sequences were
discarded.

Analysis of Greek key sequences. Most βγ-crystallins contain a double Greek key
motif that has a length of 170-180 residues. To study the constraint this motif confers
on protein evolution, sequences outside that length range were removed from the
dataset. Hence, 116 βγ-crystallin sequences were used for analysis. Predicted dn/dc
values of βγ-crystallins were compared to those of a set of 52 double Greek key
domain representatives, whose sequences were retrieved from the Protein Data
Bank.
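
The length filter described above amounts to a simple selection step, sketched below. The function name, the toy entries and the treatment of both bounds as inclusive are assumptions made for illustration; the paper does not specify how the boundaries were handled.

```python
def filter_by_length(sequences, min_len=170, max_len=180):
    """Keep only entries whose sequence length lies within [min_len, max_len].

    `sequences` maps an identifier to its amino acid sequence.
    Inclusive bounds are an assumption, not stated in the paper.
    """
    return {name: seq for name, seq in sequences.items()
            if min_len <= len(seq) <= max_len}


# Hypothetical placeholder sequences of different lengths
toy_dataset = {
    "short_fragment": "M" * 120,
    "double_greek_key": "M" * 175,
    "long_protein": "M" * 250,
}

kept = filter_by_length(toy_dataset)
print(sorted(kept))
```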

Salt bridge analysis. Salt bridge analysis was performed on a set of 3D protein
structures that are representative of βγ-crystallins. These structures were extracted
from the Protein Data Bank using a 90% sequence similarity filter to eliminate
redundancy. AIM1 proteins and a crystallin whose domains were artificially permuted were also
removed from the list. A total of 19 structures were studied: 7 human, 5 cow, 3 mouse,
1 rat, 1 sea squirt, 1 bacterium and 1 archaeon. Descriptions of salt bridges were
computed by the Salt Bridges Plugin of the molecular graphics program VMD; their
classification as interdomain and interstrand and calculation of β-hairpins was
performed using descriptions produced by PROMOTIF.
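
For readers unfamiliar with how salt bridges are typically identified from a structure, the sketch below pairs side-chain oxygen atoms of acidic residues with side-chain nitrogen atoms of basic residues that lie within a distance cut-off. This is a generic illustration, not the VMD plugin's code; the 3.2 Å cut-off, the residue labels and the coordinates are all assumptions chosen for the example.

```python
from math import dist


def find_salt_bridges(acidic_oxygens, basic_nitrogens, cutoff=3.2):
    """Return (acidic_residue, basic_residue) pairs within `cutoff` angstroms.

    Each input is a list of (residue_label, (x, y, z)) tuples.
    The 3.2 angstrom default is an assumed value for illustration.
    """
    pairs = set()
    for acid_label, o_xyz in acidic_oxygens:
        for base_label, n_xyz in basic_nitrogens:
            if dist(o_xyz, n_xyz) <= cutoff:
                pairs.add((acid_label, base_label))
    return sorted(pairs)


# Hypothetical atoms and coordinates (angstroms), not taken from any structure
acidic = [("ASP21", (0.0, 0.0, 0.0)), ("GLU46", (10.0, 0.0, 0.0))]
basic = [("ARG79", (2.9, 0.0, 0.0)), ("LYS2", (20.0, 0.0, 0.0))]

salt_bridges = find_salt_bridges(acidic, basic)
print(salt_bridges)
```

Residue numbers from pairs found in this way can then be used to classify each bridge by sequence separation and domain, as in the Results.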